Overview

Dataset statistics

Number of variables: 15
Number of observations: 850
Missing cells: 2540
Missing cells (%): 19.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 99.7 KiB
Average record size in memory: 120.2 B
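
As a quick cross-check of the figures above, the missing-cell percentage can be recomputed from the per-variable missing counts reported later in the Variables section. This is only a sketch: the dictionary is transcribed by hand from this report, and the variable names `missing_by_column`, `n_rows`, and `n_cols` are illustrative, not part of the profiling tool.

```python
# Per-variable missing counts, copied from the Variables section of this report.
missing_by_column = {
    "age": 174,
    "fare": 1,
    "cabin": 659,
    "embarked": 1,
    "boat": 542,
    "body": 777,
    "home.dest": 386,
}

n_rows, n_cols = 850, 15  # observations and variables from the overview

# Missing cells are counted over the whole rows x columns grid.
total_missing = sum(missing_by_column.values())
share = total_missing / (n_rows * n_cols)

print(total_missing, f"{share:.1%}")  # 2540 19.9%
```

The seven counts add up to the 2540 missing cells in the overview, which suggests the remaining eight variables have no missing values.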

Variable types

Numeric: 8
Categorical: 7

Alerts

name has a high cardinality: 849 distinct values [High cardinality]
ticket has a high cardinality: 660 distinct values [High cardinality]
cabin has a high cardinality: 135 distinct values [High cardinality]
home.dest has a high cardinality: 272 distinct values [High cardinality]
passenger_id is highly correlated with pclass and 1 other field [High correlation]
pclass is highly correlated with passenger_id and 3 other fields [High correlation]
fare is highly correlated with pclass and 1 other field [High correlation]
sex is highly correlated with survived [High correlation]
embarked is highly correlated with pclass and 1 other field [High correlation]
boat is highly correlated with passenger_id and 3 other fields [High correlation]
survived is highly correlated with sex [High correlation]
age has 174 (20.5%) missing values [Missing]
cabin has 659 (77.5%) missing values [Missing]
boat has 542 (63.8%) missing values [Missing]
body has 777 (91.4%) missing values [Missing]
home.dest has 386 (45.4%) missing values [Missing]
name is uniformly distributed [Uniform]
ticket is uniformly distributed [Uniform]
cabin is uniformly distributed [Uniform]
passenger_id has unique values [Unique]
sibsp has 573 (67.4%) zeros [Zeros]
parch has 651 (76.6%) zeros [Zeros]
fare has 11 (1.3%) zeros [Zeros]
survived has 537 (63.2%) zeros [Zeros]

Reproduction

Analysis started: 2022-11-02 10:46:14.203333
Analysis finished: 2022-11-02 10:46:33.861198
Duration: 19.66 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json

Variables

passenger_id
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 850
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 662.8164706
Minimum: 1
Maximum: 1307
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 59.9
Q1: 332.25
median: 676.5
Q3: 992.25
95-th percentile: 1240.55
Maximum: 1307
Range: 1306
Interquartile range (IQR): 660

Descriptive statistics

Standard deviation: 380.7519362
Coefficient of variation (CV): 0.5744454961
Kurtosis: -1.20606183
Mean: 662.8164706
Median Absolute Deviation (MAD): 328
Skewness: -0.05290275353
Sum: 563394
Variance: 144972.0369
Monotonicity: Not monotonic
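
Several of the passenger_id statistics above follow from others by simple arithmetic, which gives a way to sanity-check the report. The sketch below uses only values copied from this section; the `derived` dictionary and its key names are illustrative assumptions, not output of the profiling tool.

```python
import math

# Values copied from the passenger_id summary in this report.
n = 850
mean, std = 662.8164706, 380.7519362
minimum, maximum = 1, 1307
q1, q3 = 332.25, 992.25

derived = {
    "range": maximum - minimum,  # Maximum minus Minimum
    "iqr": q3 - q1,              # Q3 minus Q1
    "cv": std / mean,            # standard deviation relative to the mean
    "variance": std ** 2,        # square of the standard deviation
    "sum": mean * n,             # mean times number of observations
}

for name, value in derived.items():
    print(f"{name}: {value:.10g}")
```

Each derived value agrees with the corresponding reported figure (Range 1306, IQR 660, CV 0.5744454961, Variance 144972.0369, Sum 563394) up to rounding of the inputs.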
Histogram with fixed size bins (bins=50)
Common values

Value                Count  Frequency (%)
1216                 1      0.1%
802                  1      0.1%
675                  1      0.1%
1118                 1      0.1%
1247                 1      0.1%
567                  1      0.1%
790                  1      0.1%
711                  1      0.1%
985                  1      0.1%
1174                 1      0.1%
Other values (840)   840    98.8%

Smallest values

Value  Count  Frequency (%)
1      1      0.1%
2      1      0.1%
3      1      0.1%
4      1      0.1%
5      1      0.1%
8      1      0.1%
10     1      0.1%
11     1      0.1%
12     1      0.1%
13     1      0.1%

Largest values

Value  Count  Frequency (%)
1307   1      0.1%
1306   1      0.1%
1304   1      0.1%
1303   1      0.1%
1302   1      0.1%
1301   1      0.1%
1300   1      0.1%
1299   1      0.1%
1298   1      0.1%
1294   1      0.1%

pclass
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.32
Minimum: 1
Maximum: 3
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 3
Q3: 3
95-th percentile: 3
Maximum: 3
Range: 2
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.8385303201
Coefficient of variation (CV): 0.3614354828
Kurtosis: -1.260690632
Mean: 2.32
Median Absolute Deviation (MAD): 0
Skewness: -0.6586707312
Sum: 1972
Variance: 0.7031330978
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=3)
Common values

Value  Count  Frequency (%)
3      478    56.2%
1      206    24.2%
2      166    19.5%

Smallest values

Value  Count  Frequency (%)
1      206    24.2%
2      166    19.5%
3      478    56.2%

Largest values

Value  Count  Frequency (%)
3      478    56.2%
2      166    19.5%
1      206    24.2%

name
Categorical

HIGH CARDINALITY
UNIFORM

Distinct849
Distinct (%)99.9%
Missing0
Missing (%)0.0%
Memory size6.8 KiB
Kelly, Mr. James
 
2
Smyth, Miss. Julia
 
1
Flynn, Mr. John
 
1
Birkeland, Mr. Hans Martin Monsen
 
1
Peltomaki, Mr. Nikolai Johannes
 
1
Other values (844)
844 

Length

Max length82
Median length53
Mean length26.88235294
Min length12

Characters and Unicode

Total characters22850
Distinct characters60
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique848 ?
Unique (%)99.8%

Sample

1st rowSmyth, Miss. Julia
2nd rowCacic, Mr. Luka
3rd rowVan Impe, Mrs. Jean Baptiste (Rosalie Paula Govaert)
4th rowHocking, Mrs. Elizabeth (Eliza Needs)
5th rowVeal, Mr. James

Common Values

ValueCountFrequency (%)
Kelly, Mr. James2
 
0.2%
Smyth, Miss. Julia1
 
0.1%
Flynn, Mr. John1
 
0.1%
Birkeland, Mr. Hans Martin Monsen1
 
0.1%
Peltomaki, Mr. Nikolai Johannes1
 
0.1%
Thorneycroft, Mrs. Percival (Florence Kate White)1
 
0.1%
Stokes, Mr. Philip Joseph1
 
0.1%
Elias, Mr. Joseph1
 
0.1%
Carver, Mr. Alfred John1
 
0.1%
Madsen, Mr. Fridtjof Arne1
 
0.1%
Other values (839)839
98.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
mr492
 
14.2%
miss171
 
5.0%
mrs124
 
3.6%
william51
 
1.5%
master46
 
1.3%
john45
 
1.3%
henry27
 
0.8%
james24
 
0.7%
thomas23
 
0.7%
george22
 
0.6%
Other values (1466)2428
70.3%

Most occurring characters

ValueCountFrequency (%)
2606
 
11.4%
r1872
 
8.2%
e1634
 
7.2%
a1564
 
6.8%
s1261
 
5.5%
i1238
 
5.4%
n1228
 
5.4%
M1059
 
4.6%
l990
 
4.3%
o972
 
4.3%
Other values (50)8426
36.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter14692
64.3%
Uppercase Letter3475
 
15.2%
Space Separator2606
 
11.4%
Other Punctuation1793
 
7.8%
Close Punctuation136
 
0.6%
Open Punctuation136
 
0.6%
Dash Punctuation12
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r1872
12.7%
e1634
11.1%
a1564
10.6%
s1261
8.6%
i1238
8.4%
n1228
8.4%
l990
 
6.7%
o972
 
6.6%
t651
 
4.4%
h495
 
3.4%
Other values (16)2787
19.0%
Uppercase Letter
ValueCountFrequency (%)
M1059
30.5%
A235
 
6.8%
J214
 
6.2%
H176
 
5.1%
C175
 
5.0%
S174
 
5.0%
E173
 
5.0%
W140
 
4.0%
B134
 
3.9%
L123
 
3.5%
Other values (15)872
25.1%
Other Punctuation
ValueCountFrequency (%)
.851
47.5%
,850
47.4%
"84
 
4.7%
'7
 
0.4%
/1
 
0.1%
Space Separator
ValueCountFrequency (%)
2606
100.0%
Close Punctuation
ValueCountFrequency (%)
)136
100.0%
Open Punctuation
ValueCountFrequency (%)
(136
100.0%
Dash Punctuation
ValueCountFrequency (%)
-12
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin18167
79.5%
Common4683
 
20.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
r1872
 
10.3%
e1634
 
9.0%
a1564
 
8.6%
s1261
 
6.9%
i1238
 
6.8%
n1228
 
6.8%
M1059
 
5.8%
l990
 
5.4%
o972
 
5.4%
t651
 
3.6%
Other values (41)5698
31.4%
Common
ValueCountFrequency (%)
2606
55.6%
.851
 
18.2%
,850
 
18.2%
)136
 
2.9%
(136
 
2.9%
"84
 
1.8%
-12
 
0.3%
'7
 
0.1%
/1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII22850
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2606
 
11.4%
r1872
 
8.2%
e1634
 
7.2%
a1564
 
6.8%
s1261
 
5.5%
i1238
 
5.4%
n1228
 
5.4%
M1059
 
4.6%
l990
 
4.3%
o972
 
4.3%
Other values (50)8426
36.9%

sex
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.8 KiB
male: 551
female: 299

Length

Max length: 6
Median length: 4
Mean length: 4.703529412
Min length: 4

Characters and Unicode

Total characters: 3998
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: female
2nd row: male
3rd row: female
4th row: female
5th row: male

Common Values

Value   Count  Frequency (%)
male    551    64.8%
female  299    35.2%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
male551
64.8%
female299
35.2%

Most occurring characters

ValueCountFrequency (%)
e1149
28.7%
m850
21.3%
a850
21.3%
l850
21.3%
f299
 
7.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3998
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1149
28.7%
m850
21.3%
a850
21.3%
l850
21.3%
f299
 
7.5%

Most occurring scripts

ValueCountFrequency (%)
Latin3998
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1149
28.7%
m850
21.3%
a850
21.3%
l850
21.3%
f299
 
7.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII3998
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e1149
28.7%
m850
21.3%
a850
21.3%
l850
21.3%
f299
 
7.5%

age
Real number (ℝ≥0)

MISSING

Distinct: 88
Distinct (%): 13.0%
Missing: 174
Missing (%): 20.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.51984719
Minimum: 0.1667
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 KiB

Quantile statistics

Minimum: 0.1667
5-th percentile: 4
Q1: 20
median: 28
Q3: 37
95-th percentile: 57
Maximum: 80
Range: 79.8333
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 14.56224343
Coefficient of variation (CV): 0.4933034829
Kurtosis: 0.1778536956
Mean: 29.51984719
Median Absolute Deviation (MAD): 8
Skewness: 0.4582167135
Sum: 19955.4167
Variance: 212.0589338
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
18                  32     3.8%
30                  30     3.5%
24                  29     3.4%
22                  28     3.3%
25                  26     3.1%
36                  25     2.9%
28                  24     2.8%
21                  24     2.8%
29                  19     2.2%
32                  19     2.2%
Other values (78)   420    49.4%
(Missing)           174    20.5%

Smallest values

Value   Count  Frequency (%)
0.1667  1      0.1%
0.4167  1      0.1%
0.6667  1      0.1%
0.75    1      0.1%
0.8333  3      0.4%
0.9167  1      0.1%
1       4      0.5%
2       10     1.2%
3       6      0.7%
4       7      0.8%

Largest values

Value  Count  Frequency (%)
80     1      0.1%
76     1      0.1%
74     1      0.1%
70     2      0.2%
67     1      0.1%
65     2      0.2%
64     3      0.4%
63     2      0.2%
62     3      0.4%
61     3      0.4%

sibsp
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5223529412
Minimum: 0
Maximum: 8
Zeros: 573
Zeros (%): 67.4%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 2
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.112132097
Coefficient of variation (CV): 2.129081718
Kurtosis: 20.01527311
Mean: 0.5223529412
Median Absolute Deviation (MAD): 0
Skewness: 3.937364703
Sum: 444
Variance: 1.236837802
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Common values

Value  Count  Frequency (%)
0      573    67.4%
1      213    25.1%
2      25     2.9%
4      14     1.6%
3      12     1.4%
8      8      0.9%
5      5      0.6%

Smallest values

Value  Count  Frequency (%)
0      573    67.4%
1      213    25.1%
2      25     2.9%
3      12     1.4%
4      14     1.6%
5      5      0.6%
8      8      0.9%

Largest values

Value  Count  Frequency (%)
8      8      0.9%
5      5      0.6%
4      14     1.6%
3      12     1.4%
2      25     2.9%
1      213    25.1%
0      573    67.4%

parch
Real number (ℝ≥0)

ZEROS

Distinct: 8
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3823529412
Minimum: 0
Maximum: 9
Zeros: 651
Zeros (%): 76.6%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2
Maximum: 9
Range: 9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8795111168
Coefficient of variation (CV): 2.300259844
Kurtosis: 26.54875593
Mean: 0.3823529412
Median Absolute Deviation (MAD): 0
Skewness: 4.05758874
Sum: 325
Variance: 0.7735398046
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Common values

Value  Count  Frequency (%)
0      651    76.6%
1      113    13.3%
2      72     8.5%
4      5      0.6%
3      3      0.4%
5      3      0.4%
9      2      0.2%
6      1      0.1%

Smallest values

Value  Count  Frequency (%)
0      651    76.6%
1      113    13.3%
2      72     8.5%
3      3      0.4%
4      5      0.6%
5      3      0.4%
6      1      0.1%
9      2      0.2%

Largest values

Value  Count  Frequency (%)
9      2      0.2%
6      1      0.1%
5      3      0.4%
4      5      0.6%
3      3      0.4%
2      72     8.5%
1      113    13.3%
0      651    76.6%

ticket
Categorical

HIGH CARDINALITY
UNIFORM

Distinct660
Distinct (%)77.6%
Missing0
Missing (%)0.0%
Memory size6.8 KiB
CA. 2343
 
10
1601
 
8
S.O.C. 14879
 
6
CA 2144
 
6
PC 17608
 
6
Other values (655)
814 

Length

Max length18
Median length17
Mean length6.742352941
Min length3

Characters and Unicode

Total characters5731
Distinct characters35
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique542 ?
Unique (%)63.8%

Sample

1st row335432
2nd row315089
3rd row345773
4th row29105
5th row28221

Common Values

ValueCountFrequency (%)
CA. 234310
 
1.2%
16018
 
0.9%
S.O.C. 148796
 
0.7%
CA 21446
 
0.7%
PC 176086
 
0.7%
3470826
 
0.7%
1135035
 
0.6%
3826525
 
0.6%
1137815
 
0.6%
1137604
 
0.5%
Other values (650)789
92.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
pc58
 
5.4%
c.a31
 
2.9%
ca17
 
1.6%
a/517
 
1.6%
234310
 
0.9%
soton/o.q10
 
0.9%
sc/paris10
 
0.9%
w./c9
 
0.8%
16018
 
0.7%
28
 
0.7%
Other values (686)903
83.5%

Most occurring characters

ValueCountFrequency (%)
3742
12.9%
1637
11.1%
2557
9.7%
7455
 
7.9%
4417
 
7.3%
6414
 
7.2%
0397
 
6.9%
5383
 
6.7%
9300
 
5.2%
8273
 
4.8%
Other values (25)1156
20.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number4575
79.8%
Uppercase Letter614
 
10.7%
Other Punctuation291
 
5.1%
Space Separator231
 
4.0%
Lowercase Letter20
 
0.3%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
C159
25.9%
A91
14.8%
O86
14.0%
P84
13.7%
S63
 
10.3%
N34
 
5.5%
T31
 
5.0%
Q17
 
2.8%
W13
 
2.1%
I10
 
1.6%
Other values (6)26
 
4.2%
Decimal Number
ValueCountFrequency (%)
3742
16.2%
1637
13.9%
2557
12.2%
7455
9.9%
4417
9.1%
6414
9.0%
0397
8.7%
5383
8.4%
9300
6.6%
8273
 
6.0%
Lowercase Letter
ValueCountFrequency (%)
s5
25.0%
a5
25.0%
r4
20.0%
i4
20.0%
l1
 
5.0%
e1
 
5.0%
Other Punctuation
ValueCountFrequency (%)
.203
69.8%
/88
30.2%
Space Separator
ValueCountFrequency (%)
231
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common5097
88.9%
Latin634
 
11.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
C159
25.1%
A91
14.4%
O86
13.6%
P84
13.2%
S63
 
9.9%
N34
 
5.4%
T31
 
4.9%
Q17
 
2.7%
W13
 
2.1%
I10
 
1.6%
Other values (12)46
 
7.3%
Common
ValueCountFrequency (%)
3742
14.6%
1637
12.5%
2557
10.9%
7455
8.9%
4417
8.2%
6414
8.1%
0397
7.8%
5383
7.5%
9300
5.9%
8273
 
5.4%
Other values (3)522
10.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII5731
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3742
12.9%
1637
11.1%
2557
9.7%
7455
 
7.9%
4417
 
7.3%
6414
 
7.2%
0397
 
6.9%
5383
 
6.7%
9300
 
5.2%
8273
 
4.8%
Other values (25)1156
20.2%

fare
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 236
Distinct (%): 27.8%
Missing: 1
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.01270094
Minimum: 0
Maximum: 512.3292
Zeros: 11
Zeros (%): 1.3%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 7.225
Q1: 7.8958
median: 14.1083
Q3: 31
95-th percentile: 134.5
Maximum: 512.3292
Range: 512.3292
Interquartile range (IQR): 23.1042

Descriptive statistics

Standard deviation: 53.70577927
Coefficient of variation (CV): 1.5789919
Kurtosis: 26.1104364
Mean: 34.01270094
Median Absolute Deviation (MAD): 6.6125
Skewness: 4.308892269
Sum: 28876.7831
Variance: 2884.310727
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count  Frequency (%)
13                   42     4.9%
8.05                 40     4.7%
7.75                 39     4.6%
7.8958               32     3.8%
26                   29     3.4%
7.2292               20     2.4%
7.775                19     2.2%
10.5                 19     2.2%
7.8542               15     1.8%
26.55                15     1.8%
Other values (226)   579    68.1%

Smallest values

Value   Count  Frequency (%)
0       11     1.3%
3.1708  1      0.1%
4.0125  1      0.1%
6.2375  1      0.1%
6.4375  3      0.4%
6.45    1      0.1%
6.4958  1      0.1%
6.75    2      0.2%
6.8583  1      0.1%
6.95    1      0.1%

Largest values

Value     Count  Frequency (%)
512.3292  3      0.4%
263       3      0.4%
262.375   6      0.7%
247.5208  1      0.1%
227.525   4      0.5%
221.7792  3      0.4%
211.5     5      0.6%
211.3375  2      0.2%
164.8667  3      0.4%
153.4625  2      0.2%

cabin
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct135
Distinct (%)70.7%
Missing659
Missing (%)77.5%
Memory size6.8 KiB
B57 B59 B63 B66
 
4
D
 
4
C22 C26
 
4
B96 B98
 
4
G6
 
4
Other values (130)
171 

Length

Max length15
Median length3
Mean length3.832460733
Min length1

Characters and Unicode

Total characters732
Distinct characters19
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique94 ?
Unique (%)49.2%

Sample

1st rowC82
2nd rowD15
3rd rowC50
4th rowE33
5th rowB57 B59 B63 B66

Common Values

ValueCountFrequency (%)
B57 B59 B63 B664
 
0.5%
D4
 
0.5%
C22 C264
 
0.5%
B96 B984
 
0.5%
G64
 
0.5%
F333
 
0.4%
C23 C25 C273
 
0.4%
A343
 
0.4%
C783
 
0.4%
C1013
 
0.4%
Other values (125)156
 
18.4%
(Missing)659
77.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
f5
 
2.1%
b574
 
1.7%
c264
 
1.7%
g64
 
1.7%
b984
 
1.7%
b964
 
1.7%
b594
 
1.7%
c224
 
1.7%
b664
 
1.7%
b634
 
1.7%
Other values (142)194
82.6%

Most occurring characters

ValueCountFrequency (%)
C76
10.4%
B69
 
9.4%
268
 
9.3%
363
 
8.6%
157
 
7.8%
656
 
7.7%
550
 
6.8%
44
 
6.0%
438
 
5.2%
833
 
4.5%
Other values (9)178
24.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number453
61.9%
Uppercase Letter235
32.1%
Space Separator44
 
6.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
268
15.0%
363
13.9%
157
12.6%
656
12.4%
550
11.0%
438
8.4%
833
7.3%
930
6.6%
730
6.6%
028
6.2%
Uppercase Letter
ValueCountFrequency (%)
C76
32.3%
B69
29.4%
D31
13.2%
E27
 
11.5%
F12
 
5.1%
A12
 
5.1%
G7
 
3.0%
T1
 
0.4%
Space Separator
ValueCountFrequency (%)
44
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common497
67.9%
Latin235
32.1%

Most frequent character per script

Common
ValueCountFrequency (%)
268
13.7%
363
12.7%
157
11.5%
656
11.3%
550
10.1%
44
8.9%
438
7.6%
833
6.6%
930
6.0%
730
6.0%
Latin
ValueCountFrequency (%)
C76
32.3%
B69
29.4%
D31
13.2%
E27
 
11.5%
F12
 
5.1%
A12
 
5.1%
G7
 
3.0%
T1
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII732
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C76
10.4%
B69
 
9.4%
268
 
9.3%
363
 
8.6%
157
 
7.8%
656
 
7.7%
550
 
6.8%
44
 
6.0%
438
 
5.2%
833
 
4.5%
Other values (9)178
24.3%

embarked
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.4%
Missing: 1
Missing (%): 0.1%
Memory size: 6.8 KiB
S: 589
C: 176
Q: 84

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 849
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Q
2nd row: S
3rd row: S
4th row: S
5th row: S

Common Values

Value      Count  Frequency (%)
S          589    69.3%
C          176    20.7%
Q          84     9.9%
(Missing)  1      0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
s589
69.4%
c176
 
20.7%
q84
 
9.9%

Most occurring characters

ValueCountFrequency (%)
S589
69.4%
C176
 
20.7%
Q84
 
9.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter849
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S589
69.4%
C176
 
20.7%
Q84
 
9.9%

Most occurring scripts

ValueCountFrequency (%)
Latin849
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
S589
69.4%
C176
 
20.7%
Q84
 
9.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII849
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S589
69.4%
C176
 
20.7%
Q84
 
9.9%

boat
Categorical

HIGH CORRELATION
MISSING

Distinct26
Distinct (%)8.4%
Missing542
Missing (%)63.8%
Memory size6.8 KiB
4
25 
C
24 
13
23 
14
23 
15
 
19
Other values (21)
194 

Length

Max length7
Median length1
Mean length1.487012987
Min length1

Characters and Unicode

Total characters458
Distinct characters15
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique5 ?
Unique (%)1.6%

Sample

1st row13
2nd row4
3rd row10
4th rowC
5th row8

Common Values

ValueCountFrequency (%)
425
 
2.9%
C24
 
2.8%
1323
 
2.7%
1423
 
2.7%
1519
 
2.2%
1018
 
2.1%
1618
 
2.1%
918
 
2.1%
1116
 
1.9%
316
 
1.9%
Other values (16)108
 
12.7%
(Missing)542
63.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
c26
 
8.2%
425
 
7.9%
1325
 
7.9%
1423
 
7.3%
1522
 
7.0%
1619
 
6.0%
919
 
6.0%
1018
 
5.7%
d16
 
5.1%
1116
 
5.1%
Other values (10)107
33.9%

Most occurring characters

ValueCountFrequency (%)
1152
33.2%
448
 
10.5%
341
 
9.0%
536
 
7.9%
631
 
6.8%
C26
 
5.7%
220
 
4.4%
919
 
4.1%
018
 
3.9%
D16
 
3.5%
Other values (5)51
 
11.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number393
85.8%
Uppercase Letter57
 
12.4%
Space Separator8
 
1.7%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1152
38.7%
448
 
12.2%
341
 
10.4%
536
 
9.2%
631
 
7.9%
220
 
5.1%
919
 
4.8%
018
 
4.6%
715
 
3.8%
813
 
3.3%
Uppercase Letter
ValueCountFrequency (%)
C26
45.6%
D16
28.1%
A10
 
17.5%
B5
 
8.8%
Space Separator
ValueCountFrequency (%)
8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common401
87.6%
Latin57
 
12.4%

Most frequent character per script

Common
ValueCountFrequency (%)
1152
37.9%
448
 
12.0%
341
 
10.2%
536
 
9.0%
631
 
7.7%
220
 
5.0%
919
 
4.7%
018
 
4.5%
715
 
3.7%
813
 
3.2%
Latin
ValueCountFrequency (%)
C26
45.6%
D16
28.1%
A10
 
17.5%
B5
 
8.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII458
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1152
33.2%
448
 
10.5%
341
 
9.0%
536
 
7.9%
631
 
6.8%
C26
 
5.7%
220
 
4.4%
919
 
4.1%
018
 
3.9%
D16
 
3.5%
Other values (5)51
 
11.1%

body
Real number (ℝ≥0)

MISSING

Distinct: 73
Distinct (%): 100.0%
Missing: 777
Missing (%): 91.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 165.8219178
Minimum: 4
Maximum: 328
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 KiB

Quantile statistics

Minimum: 4
5-th percentile: 15.2
Q1: 75
median: 166
Q3: 260
95-th percentile: 310.2
Maximum: 328
Range: 324
Interquartile range (IQR): 185

Descriptive statistics

Standard deviation: 99.06848676
Coefficient of variation (CV): 0.5974390362
Kurtosis: -1.269968259
Mean: 165.8219178
Median Absolute Deviation (MAD): 94
Skewness: 0.03894268679
Sum: 12105
Variance: 9814.565068
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
155                 1      0.1%
81                  1      0.1%
234                 1      0.1%
7                   1      0.1%
269                 1      0.1%
75                  1      0.1%
236                 1      0.1%
108                 1      0.1%
327                 1      0.1%
96                  1      0.1%
Other values (63)   63     7.4%
(Missing)           777    91.4%

Smallest values

Value  Count  Frequency (%)
4      1      0.1%
7      1      0.1%
9      1      0.1%
14     1      0.1%
16     1      0.1%
19     1      0.1%
32     1      0.1%
37     1      0.1%
38     1      0.1%
45     1      0.1%

Largest values

Value  Count  Frequency (%)
328    1      0.1%
327    1      0.1%
314    1      0.1%
312    1      0.1%
309    1      0.1%
307    1      0.1%
306    1      0.1%
305    1      0.1%
304    1      0.1%
295    1      0.1%

home.dest
Categorical

HIGH CARDINALITY
MISSING

Distinct272
Distinct (%)58.6%
Missing386
Missing (%)45.4%
Memory size6.8 KiB
New York, NY
 
36
Cornwall / Akron, OH
 
7
London
 
7
Wiltshire, England Niagara Falls, NY
 
6
Montreal, PQ
 
6
Other values (267)
402 

Length

Max length50
Median length38.5
Mean length19.29310345
Min length5

Characters and Unicode

Total characters8952
Distinct characters55
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique180 ?
Unique (%)38.8%

Sample

1st rowCroatia
2nd rowCornwall / Akron, OH
3rd rowBarre, Co Washington, VT
4th rowFinland / Washington, DC
5th rowElkins Park, PA

Common Values

ValueCountFrequency (%)
New York, NY36
 
4.2%
Cornwall / Akron, OH7
 
0.8%
London7
 
0.8%
Wiltshire, England Niagara Falls, NY6
 
0.7%
Montreal, PQ6
 
0.7%
Sweden Winnipeg, MN6
 
0.7%
Philadelphia, PA6
 
0.7%
Brooklyn, NY5
 
0.6%
Bryn Mawr, PA4
 
0.5%
Ireland New York, NY4
 
0.5%
Other values (262)377
44.4%
(Missing)386
45.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ny104
 
7.1%
103
 
7.0%
new72
 
4.9%
york67
 
4.5%
england65
 
4.4%
pa30
 
2.0%
london26
 
1.8%
sweden25
 
1.7%
nj21
 
1.4%
il20
 
1.4%
Other values (361)942
63.9%

Most occurring characters

ValueCountFrequency (%)
1014
 
11.3%
n651
 
7.3%
o576
 
6.4%
a547
 
6.1%
e543
 
6.1%
,535
 
6.0%
r482
 
5.4%
l362
 
4.0%
i346
 
3.9%
t316
 
3.5%
Other values (45)3580
40.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5532
61.8%
Uppercase Letter1748
 
19.5%
Space Separator1014
 
11.3%
Other Punctuation641
 
7.2%
Dash Punctuation17
 
0.2%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N262
15.0%
Y177
 
10.1%
S123
 
7.0%
M115
 
6.6%
C108
 
6.2%
P102
 
5.8%
A101
 
5.8%
I95
 
5.4%
B86
 
4.9%
E85
 
4.9%
Other values (15)494
28.3%
Lowercase Letter
ValueCountFrequency (%)
n651
11.8%
o576
10.4%
a547
9.9%
e543
9.8%
r482
 
8.7%
l362
 
6.5%
i346
 
6.3%
t316
 
5.7%
d268
 
4.8%
s214
 
3.9%
Other values (14)1227
22.2%
Other Punctuation
ValueCountFrequency (%)
,535
83.5%
/104
 
16.2%
'1
 
0.2%
?1
 
0.2%
Space Separator
ValueCountFrequency (%)
1014
100.0%
Dash Punctuation
ValueCountFrequency (%)
-17
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7280
81.3%
Common1672
 
18.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
n651
 
8.9%
o576
 
7.9%
a547
 
7.5%
e543
 
7.5%
r482
 
6.6%
l362
 
5.0%
i346
 
4.8%
t316
 
4.3%
d268
 
3.7%
N262
 
3.6%
Other values (39)2927
40.2%
Common
ValueCountFrequency (%)
1014
60.6%
,535
32.0%
/104
 
6.2%
-17
 
1.0%
'1
 
0.1%
?1
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII8952
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1014
 
11.3%
n651
 
7.3%
o576
 
6.4%
a547
 
6.1%
e543
 
6.1%
,535
 
6.0%
r482
 
5.4%
l362
 
4.0%
i346
 
3.9%
t316
 
3.5%
Other values (45)3580
40.0%

survived
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3682352941
Minimum: 0
Maximum: 1
Zeros: 537
Zeros (%): 63.2%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 1
Maximum: 1
Range: 1
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.4826096523
Coefficient of variation (CV): 1.310601292
Kurtosis: -1.704436332
Mean: 0.3682352941
Median Absolute Deviation (MAD): 0
Skewness: 0.5473387077
Sum: 313
Variance: 0.2329120765
Monotonicity: Not monotonic
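
Because survived takes only the values 0 and 1, its summary statistics can be rebuilt from the two counts in this section (537 zeros, 313 ones). The sketch below assumes the report uses the sample (n - 1) variance, which the reported figures appear to confirm; the variable names are illustrative.

```python
import math
from statistics import mean, stdev, variance

# Rebuild the column from its reported value counts: 537 zeros and 313 ones.
survived = [0] * 537 + [1] * 313
n = len(survived)

p = mean(survived)          # share of ones, i.e. the survival rate
var = variance(survived)    # sample variance (divides by n - 1)
sd = stdev(survived)        # sample standard deviation
cv = sd / p                 # coefficient of variation

# For a 0/1 variable the sample variance has a closed form.
closed_form = n / (n - 1) * p * (1 - p)

print(f"mean={p:.10f} variance={var:.10f} std={sd:.10f} cv={cv:.9f}")
```

The results match the reported Mean, Variance, Standard deviation, and CV to the precision shown in the report.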
Histogram with fixed size bins (bins=2)
Common values

Value  Count  Frequency (%)
0      537    63.2%
1      313    36.8%

Smallest values

Value  Count  Frequency (%)
0      537    63.2%
1      313    36.8%

Largest values

Value  Count  Frequency (%)
1      313    36.8%
0      537    63.2%

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
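The calculation above can be sketched in plain Python. This is a minimal illustration, not the code the report itself runs: the helper names `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical, the sample values are made up, and ties are assumed to receive the average of their rank positions.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based position of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman_rho(xs, ys):
    """Spearman's rho is Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Made-up data: y is a monotonic but nonlinear function of x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)   # close to 1: the relationship is perfectly monotonic
r_raw = pearson_r(x, y)    # below 1: the relationship is not linear
print(rho, r_raw)
```

On this toy input ρ comes out at (numerically) 1 while Pearson's r stays below 1, which is the difference the paragraph above describes.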

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
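As a rough sketch of this formula and of the location and scale invariance mentioned above, the following plain-Python example computes r for a small made-up sample and again after shifting and rescaling both variables. The function name `pearson_r` and the data are illustrative assumptions, not part of the report.

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations.

    The 1/(n-1) normalisation factors cancel between numerator and
    denominator, so plain sums of deviations are used here.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up sample values.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r_original = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
x_rescaled = [10 * v + 7 for v in x]
y_rescaled = [0.5 * v - 3 for v in y]
r_rescaled = pearson_r(x_rescaled, y_rescaled)

print(r_original, r_rescaled)  # the two values agree up to rounding
```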

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
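The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows, with a hypothetical function name `kendall_tau_a` and made-up data. Statistical libraries often default to a tie-adjusted variant (tau-b) instead, so results can differ from this sketch when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables order it the same way.
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 means a tie in x or y; tau-a counts it as neither.
    return (concordant - discordant) / len(pairs)

# Made-up sample: 10 pairs in total, 8 concordant and 2 discordant.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)
print(tau)  # (8 - 2) / 10 = 0.6
```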

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
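One way to compute a bias-corrected Cramér's V along the lines of Bergsma's 2013 proposal is sketched below. It is an illustration under assumptions: the function name `cramers_v_corrected` and the toy category labels are hypothetical, no continuity correction is applied to the chi-squared statistic, and the report's own implementation may differ in such details.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long lists of category labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(row_totals), len(col_totals)

    # Bias corrections: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)

    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Made-up labels: the second variable is fully determined by the first.
v_perfect = cramers_v_corrected(["f"] * 10 + ["m"] * 10, [1] * 10 + [0] * 10)

# Made-up labels: every combination occurs equally often (independence).
v_independent = cramers_v_corrected(["f", "f", "m", "m"] * 5, [1, 0, 1, 0] * 5)

print(v_perfect, v_independent)
```

The two toy inputs sit at the ends of the scale: the fully determined pair gives a value of (numerically) 1 and the balanced pair gives 0.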

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
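The nullity correlation described for the heatmap can be illustrated as the correlation between two present/absent indicators. The sketch below is one common way to compute it (the phi coefficient of a 2x2 table, which equals Pearson's r on the 0/1 indicators). The function name `nullity_correlation` and the rows are made up for illustration, missing values are assumed to be represented by `None`, and this is not necessarily how the plot in this report was produced.

```python
import math

def nullity_correlation(col_a, col_b):
    """Correlation between the 'value is present' indicators of two columns."""
    a = [v is not None for v in col_a]
    b = [v is not None for v in col_b]

    # Cells of the 2x2 presence table.
    n11 = sum(x and y for x, y in zip(a, b))              # both present
    n10 = sum(x and not y for x, y in zip(a, b))          # only a present
    n01 = sum(y and not x for x, y in zip(a, b))          # only b present
    n00 = sum(not x and not y for x, y in zip(a, b))      # both missing

    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return float("nan")  # a column that is always present or always missing
    return (n11 * n00 - n10 * n01) / denom

# Made-up rows: the two columns are never filled in together.
boat = ["13", None, None, "4", None, None]
body = [None, None, 173.0, None, None, 19.0]

phi = nullity_correlation(boat, body)
print(phi)  # -0.5
```

A negative value means that when one column is filled in the other tends to be missing. A value near +1 would mean the two columns tend to be present or missing together.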

Sample

First rows

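(index) | passenger_id | pclass | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest | survived
0 | 1216 | 3 | Smyth, Miss. Julia | female | NaN | 0 | 0 | 335432 | 7.7333 | NaN | Q | 13 | NaN | NaN | 1
1 | 699 | 3 | Cacic, Mr. Luka | male | 38.0 | 0 | 0 | 315089 | 8.6625 | NaN | S | NaN | NaN | Croatia | 0
2 | 1267 | 3 | Van Impe, Mrs. Jean Baptiste (Rosalie Paula Govaert) | female | 30.0 | 1 | 1 | 345773 | 24.1500 | NaN | S | NaN | NaN | NaN | 0
3 | 449 | 2 | Hocking, Mrs. Elizabeth (Eliza Needs) | female | 54.0 | 1 | 3 | 29105 | 23.0000 | NaN | S | 4 | NaN | Cornwall / Akron, OH | 1
4 | 576 | 2 | Veal, Mr. James | male | 40.0 | 0 | 0 | 28221 | 13.0000 | NaN | S | NaN | NaN | Barre, Co Washington, VT | 0
5 | 1083 | 3 | Olsen, Mr. Henry Margido | male | 28.0 | 0 | 0 | C 4001 | 22.5250 | NaN | S | NaN | 173.0 | NaN | 0
6 | 898 | 3 | Johnson, Mr. William Cahoone Jr | male | 19.0 | 0 | 0 | LINE | 0.0000 | NaN | S | NaN | NaN | NaN | 0
7 | 560 | 2 | Sinkkonen, Miss. Anna | female | 30.0 | 0 | 0 | 250648 | 13.0000 | NaN | S | 10 | NaN | Finland / Washington, DC | 1
8 | 1079 | 3 | Ohman, Miss. Velin | female | 22.0 | 0 | 0 | 347085 | 7.7750 | NaN | S | C | NaN | NaN | 1
9 | 908 | 3 | Jussila, Miss. Mari Aina | female | 21.0 | 1 | 0 | 4137 | 9.8250 | NaN | S | NaN | NaN | NaN | 0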

Last rows

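(index) | passenger_id | pclass | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest | survived
840 | 1256 | 3 | Touma, Master. Georges Youssef | male | 7.0 | 1 | 1 | 2650 | 15.2458 | NaN | C | C | NaN | NaN | 1
841 | 208 | 1 | Minahan, Mrs. William Edward (Lillian E Thorpe) | female | 37.0 | 1 | 0 | 19928 | 90.0000 | C78 | Q | 14 | NaN | Fond du Lac, WI | 1
842 | 709 | 3 | Carr, Miss. Helen "Ellen" | female | 16.0 | 0 | 0 | 367231 | 7.7500 | NaN | Q | 16 | NaN | Co Longford, Ireland New York, NY | 1
843 | 1288 | 3 | Wiklund, Mr. Jakob Alfred | male | 18.0 | 1 | 0 | 3101267 | 6.4958 | NaN | S | NaN | 314.0 | NaN | 0
844 | 165 | 1 | Hoyt, Mr. Frederick Maxfield | male | 38.0 | 1 | 0 | 19943 | 90.0000 | C93 | S | D | NaN | New York, NY / Stamford CT | 1
845 | 158 | 1 | Hipkins, Mr. William Edward | male | 55.0 | 0 | 0 | 680 | 50.0000 | C39 | S | NaN | NaN | London / Birmingham | 0
846 | 174 | 1 | Kent, Mr. Edward Austin | male | 58.0 | 0 | 0 | 11771 | 29.7000 | B37 | C | NaN | 258.0 | Buffalo, NY | 0
847 | 467 | 2 | Kantor, Mrs. Sinai (Miriam Sternin) | female | 24.0 | 1 | 0 | 244367 | 26.0000 | NaN | S | 12 | NaN | Moscow / Bronx, NY | 1
848 | 1112 | 3 | Peacock, Miss. Treasteall | female | 3.0 | 1 | 1 | SOTON/O.Q. 3101315 | 13.7750 | NaN | S | NaN | NaN | NaN | 0
849 | 425 | 2 | Greenberg, Mr. Samuel | male | 52.0 | 0 | 0 | 250647 | 13.0000 | NaN | S | NaN | 19.0 | Bronx, NY | 0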